A Chinese word segmentation based on language situation in processing ambiguous words

Authors

  • Mao-yuan Zhang
  • Zhengding Lu
  • Chun-yan Zou
Abstract

Natural language processing benefits text mining, and Chinese word segmentation is an important step in processing Chinese natural language. In this paper, the convergent nature of the segmentation process is analyzed, and a theory of Chinese word segmentation based on language situation is deduced. Based on this theory, a Chinese word segmentation algorithm is presented. Both the theoretical analysis and the experimental results show that the algorithm is efficient. © 2003 Elsevier Inc. All rights reserved.
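
The abstract does not describe the authors' algorithm, so the sketch below is background only. It uses a different, well-known dictionary baseline (forward and backward maximum matching), not the paper's language-situation method, to show what a segmentation ambiguity looks like: the same string can split into different word sequences depending on scan direction. The toy lexicon, the example sentence, and the names `forward_max_match`, `backward_max_match`, and `is_ambiguous` are hypothetical and chosen for illustration.

```python
# Background sketch: forward/backward maximum matching (FMM/BMM).
# This is NOT the paper's language-situation algorithm; the lexicon is a hypothetical toy.
LEXICON = {"研究", "研究生", "生命", "命", "起源", "的"}
MAX_LEN = max(len(w) for w in LEXICON)


def forward_max_match(text, lexicon, max_len):
    """Scan left to right, greedily taking the longest lexicon word (or a single char)."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(text, lexicon, max_len):
    """Scan right to left, greedily taking the longest lexicon word (or a single char)."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.insert(0, piece)
                j -= size
                break
    return words


def is_ambiguous(text, lexicon, max_len):
    """Flag text whose FMM and BMM segmentations disagree (catches only some ambiguities)."""
    return forward_max_match(text, lexicon, max_len) != backward_max_match(text, lexicon, max_len)


if __name__ == "__main__":
    sentence = "研究生命的起源"  # "study the origin of life"
    print(forward_max_match(sentence, LEXICON, MAX_LEN))   # ['研究生', '命', '的', '起源']
    print(backward_max_match(sentence, LEXICON, MAX_LEN))  # ['研究', '生命', '的', '起源']
    print(is_ambiguous(sentence, LEXICON, MAX_LEN))        # True
```

Here forward matching wrongly reads 研究生 ("graduate student"), while backward matching finds 研究 / 生命 ("study" / "life"). Disagreement between the two scans is only a heuristic signal: when both scans agree on a wrong split, the ambiguity goes undetected, which is one reason context-aware approaches are studied.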

Similar resources

Chinese Tweets Segmentation based on Morphemes

Chinese tweets segmentation is a critical problem in natural language processing area. While segmentation of in-vocabulary words is well studied to date, few research findings are yet available concerning the prediction of new words on twitter. In this paper, we attempt to exploit multiple features for segmenting tweets in real text. To this end, we first take morpheme as the basic component un...

Do Chinese Readers Follow the National Standard Rules for Word Segmentation during Reading?

We conducted a preliminary study to examine whether Chinese readers' spontaneous word segmentation processing is consistent with the national standard rules of word segmentation based on the Contemporary Chinese language word segmentation specification for information processing (CCLWSSIP). Participants were asked to segment Chinese sentences into individual words according to their prior knowl...

A Web-based Approach To Chinese Word Segmentation

Chinese text processing requires the detection of word boundaries. This is a non-trivial step because Chinese does not contain explicit whitespace between words. Existing word segmentation techniques make use of precompiled dictionaries and treebanks. The creation of dictionaries and treebanks is a labor-intensive process and consequently they are updated infrequently. Furthermore, due to their...

An Unsupervised Approach to Chinese Word Sense Disambiguation Based on Hownet

The research on word sense disambiguation (WSD) has great theoretical and practical significance in many fields of natural language processing (NLP). This paper presents an unsupervised approach to Chinese word sense disambiguation based on Hownet (an electronic Chinese lexical resource). In our approach, contexts that include ambiguous words are converted into vectors by means of a second-orde...

Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach

This paper presents a pragmatic approach to Chinese word segmentation. It differentiates from most of the previous approaches mainly in three respects. First of all, while theoretical linguists have defined Chinese words with various linguistic criteria, Chinese words in this study are defined pragmatically as segmentation units whose definition depends on how they are used and processed in rea...

Journal title:
  • Inf. Sci.

Volume 162

Publication date: 2004